ASYMPTOTIC THEORY OF SEMIPARAMETRIC Z-ESTIMATORS
FOR STOCHASTIC PROCESSES WITH APPLICATIONS TO 
ERGODIC DIFFUSIONS AND TIME SERIES 

By Yoichi Nishiyama 

Institute of Statistical Mathematics 

This paper generalizes a part of the theory of Z-estimation, which
has been developed mainly in the context of modern empirical
processes, to the case of stochastic processes, typically semimartingales.
We present a general theorem to derive the asymptotic behavior of the
solution to an estimating equation $\theta \mapsto \Psi_n(\theta, \hat h_n) = 0$ with an abstract
nuisance parameter $h$ when the compensator of $\Psi_n$ is random. As its
application, we consider the estimation problem in an ergodic diffusion
process model where the drift coefficient contains an unknown,
finite-dimensional parameter $\theta$ and the diffusion coefficient is indexed
by a nuisance parameter $h$ from an infinite-dimensional space. An
example of the nuisance parameter space is a class of smooth functions.
We establish the asymptotic normality and efficiency of a Z-estimator
for the drift coefficient. As another application, we present a similar
result in an ergodic time series model.

1. Introduction. Let us begin by stating our motivating example; the
details are presented in Section 4. Consider the one-dimensional ergodic
diffusion process $X$ on $I = (l, r) \subset \mathbb{R}$ which is a solution to the stochastic
differential equation (SDE) given by

$$X_t = X_0 + \int_0^t S(X_s;\theta)\,ds + \int_0^t \sigma(X_s;h)\,dW_s, \tag{1}$$

where $s \mapsto W_s$ is a standard Brownian motion. Here, we consider a
$d$-dimensional parametric family $\{S(\cdot;\theta); \theta \in \Theta\}$ for the drift coefficient
indexed by a compact subset $\Theta$ of $\mathbb{R}^d$, and a possibly infinite-dimensional
"parametric" family $\{\sigma^2(\cdot;h); h \in H\}$ for the diffusion coefficient indexed
by a (general) totally bounded metric space $(H, d_H)$. We denote by $(\theta_0, h_0)$
the true value of $(\theta, h)$. Our aim is to estimate $\theta_0$ when the model is
perturbed by the unknown nuisance parameter $h$. As for the parameter $h_0$, we
construct a $d_H$-consistent estimator $\hat h_n$. We prove that the Z-estimator $\hat\theta_n$,
which is a solution to an estimating equation $\Psi_n(\theta, \hat h_n) = 0$, is
asymptotically normal and efficient. [We follow van der Vaart and Wellner (1996) for
the terminology "Z-estimator."]
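To make the model (1) concrete, the following is a minimal simulation sketch based on the Euler-Maruyama scheme. It is an illustration only and not part of the paper: the linear drift $S(x;\theta) = -\theta x$, the particular diffusion coefficient, the grid and the function name `simulate_diffusion` are all assumptions chosen for this example.

```python
import numpy as np

def simulate_diffusion(drift, sigma, x0, times, rng):
    """Euler-Maruyama approximation of dX = drift(X) dt + sigma(X) dW on a time grid."""
    times = np.asarray(times, dtype=float)
    x = np.empty(times.shape[0])
    x[0] = x0
    for i in range(1, times.shape[0]):
        dt = times[i] - times[i - 1]
        dw = rng.normal(0.0, np.sqrt(dt))  # Brownian increment over [t_{i-1}, t_i]
        x[i] = x[i - 1] + drift(x[i - 1]) * dt + sigma(x[i - 1]) * dw
    return x

# Hypothetical example: mean-reverting drift and a smooth, bounded-away-from-zero sigma.
theta0 = 1.5
drift = lambda x: -theta0 * x
sigma = lambda x: np.sqrt(1.0 + 0.5 * np.tanh(x) ** 2)
times = np.linspace(0.0, 50.0, 5001)
path = simulate_diffusion(drift, sigma, 0.3, times, np.random.default_rng(0))
```

The scheme only approximates the law of the solution of (1); it is used here to produce discretely observed data of the kind considered below.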

There are many works that treat the estimation problem for the
drift coefficient. It is well known that when the process $X = (X_t)_{t\in[0,\infty)}$ is
observed continuously on the time interval $[0,T]$, the diffusion coefficient
may be assumed to be known without loss of generality. (So, we may put
$h = h_0$.) In such cases, the asymptotic normality and efficiency of the
maximum likelihood estimator (MLE) $\hat\theta_T$ for $\theta_0$, as $T \to \infty$, have already been
established. See, for example, Kutoyants (2004). The MLE $\hat\theta_T$ is a solution
to the estimating equation $\dot\ell_T(\theta) = 0$ with

$$\dot\ell_T(\theta) = \frac{1}{T}\int_0^T \frac{\dot S(X_t;\theta)}{\sigma^2(X_t;h_0)}\,[dX_t - S(X_t;\theta)\,dt],$$

where $\dot S$ denotes the derivative of $S$ with respect to $\theta$. On the other hand,
when the process $X = (X_t)_{t\in[0,\infty)}$ is observed only at discrete time points
$\{0 = t_0^n < t_1^n < \cdots < t_n^n\}$, the diffusion coefficient has to be estimated, too.
Florens-Zmirou (1989), Yoshida (1992) and Kessler (1997), among others,
considered such situations when $H$ is a finite-dimensional parameter space,
and proved the asymptotic efficiency of some estimators $\hat\theta_n$ for $\theta_0$. Our result
does not include these works as special cases, because we assume a condition,
which is theoretically strong but practically reasonable, that

$$\Delta_n = \max_{1\le i\le n} |t_i^n - t_{i-1}^n| = o((t_n^n)^{-1}) \quad\text{and}\quad t_n^n \to \infty,$$

which is almost the same as the assumption $n\Delta_n^2 \to 0$. For example, Kessler's
(1997) assumption $n\Delta_n^p \to 0$ for a given $p > 2$ is weaker than ours. Another
difference is that the preceding works derive not only the consistency of the
finite-dimensional estimator $\hat h_n$ but also its asymptotic distribution, while
we prove only the $d_H$-consistency. However, our work is the first attempt
to propose an asymptotically efficient estimator for $\theta_0$ when the
nuisance parameter $h$ belongs to an infinite-dimensional space $(H, d_H)$. Here,
by "asymptotically efficient" we mean that the rescaled residual $\sqrt{t_n^n}(\hat\theta_n - \theta_0)$
has the same asymptotic distribution as in the continuous observation case
with $h = h_0$ known, which has been shown to be optimal in the
framework of local asymptotic normality theory.

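We approach this problem by using the approximation of $\dot\ell_T(\theta)$ given by

$$\Psi_n(\theta,h) = \frac{1}{t_n^n}\sum_{i=1}^n \frac{\dot S(X_{t_{i-1}^n};\theta)}{\sigma^2(X_{t_{i-1}^n};h)}\,\bigl[X_{t_i^n} - X_{t_{i-1}^n} - S(X_{t_{i-1}^n};\theta)\,|t_i^n - t_{i-1}^n|\bigr],$$

where $h_0$ has been replaced by the unknown parameter $h$. Its "compensator"
is

$$\tilde\Psi_n(\theta,h) = \frac{1}{t_n^n}\sum_{i=1}^n \frac{\dot S(X_{t_{i-1}^n};\theta)}{\sigma^2(X_{t_{i-1}^n};h)}\,\Bigl[\int_{t_{i-1}^n}^{t_i^n} S(X_s;\theta_0)\,ds - S(X_{t_{i-1}^n};\theta)\,|t_i^n - t_{i-1}^n|\Bigr],$$

in the sense that the difference $\Psi_n(\theta,h) - \tilde\Psi_n(\theta,h)$ is the terminal variable
of a martingale. The key points are to show the weak convergence of the
rescaled random fields $(\theta,h) \mapsto r_n(\Psi_n(\theta,h) - \tilde\Psi_n(\theta,h))$ for some constant $r_n$
tending to $\infty$, and to show the differentiability of $(\theta,h) \mapsto \tilde\Psi_n(\theta,h)$ around
$(\theta_0,h_0)$. Roughly speaking, our main result asserts that if we assume that $h \mapsto
\sigma^2(\cdot;h)$ is Lipschitz continuous with respect to $d_H$, that the metric entropy
condition is satisfied,

$$\int_0^1 \sqrt{\log N(H, d_H, \varepsilon)}\,d\varepsilon < \infty,$$

where $N(H, d_H, \varepsilon)$ is the $\varepsilon$-covering number of $H$ with respect to $d_H$, and that we
have a $d_H$-consistent estimator $\hat h_n$ for $h_0$, then we can derive the asymptotic
distribution of $r_n(\hat\theta_n - \theta_0)$. The consistency of $\hat h_n$ should be established
separately.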
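As an illustration of how such an estimating equation might be evaluated from discrete observations, here is a minimal sketch. It assumes the estimating function is the discretized score weighted by $\dot S/\sigma^2$ at the left endpoint of each interval, a scalar parameter, and the hypothetical linear drift $S(x;\theta) = -\theta x$, for which the root is available in closed form. The function names and the plug-in `sigma2` (standing for $x \mapsto \sigma^2(x;\hat h_n)$) are assumptions of this example, not the paper's notation.

```python
import numpy as np

def psi_n(theta, x, t, S, S_dot, sigma2):
    """Discretized estimating function for scalar theta; t[0] is assumed to be 0."""
    x_prev, dx, dt = x[:-1], np.diff(x), np.diff(t)
    weights = S_dot(x_prev, theta) / sigma2(x_prev)
    return np.sum(weights * (dx - S(x_prev, theta) * dt)) / t[-1]

def theta_hat_linear(x, t, sigma2):
    """Closed-form root of psi_n when S(x; theta) = -theta * x."""
    x_prev, dx, dt = x[:-1], np.diff(x), np.diff(t)
    w = 1.0 / sigma2(x_prev)
    return -np.sum(w * x_prev * dx) / np.sum(w * x_prev ** 2 * dt)

# Hypothetical model ingredients for the linear-drift example.
S = lambda x, theta: -theta * x
S_dot = lambda x, theta: -x
sigma2 = lambda x: 1.0 + x ** 2

# Noise-free toy data satisfying dx = -2 * x * dt exactly, so the root is theta = 2.
t = np.linspace(0.0, 1.0, 11)
x = 0.8 ** np.arange(11)
theta_hat = theta_hat_linear(x, t, sigma2)
```

For nonlinear drifts no closed form is available in general, and the root would have to be found numerically.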

This approach is based on a new theory for general Z-estimators with 
infinite-dimensional nuisance parameters presented in Section 2, although 
its proof is just an adaptation of that of Chapter 3.3 of van der Vaart 
and Wellner (1996) who considered the case where the compensator is 
neither random nor depending on n. Hopefully, this extension considerably 
enlarges the application fields of van der Vaart and Wellner's theory to 
various stochastic process models. Indeed, we also present a result for time 
series in Section 5, which is briefly introduced below. Kosorok's (2008) new 
book does not seem to cover our examples. 

In Section 5, we will consider an ergodic time series model of the form

$$X_i = S(X_{i-1},\dots,X_{i-p};\theta) + \sigma(X_{i-1},\dots,X_{i-q};h)\,w_i,$$

where $E[w_i|\mathcal{F}_{i-1}] = 0$ and $E[w_i^2|\mathcal{F}_{i-1}] = 1$. Here, $\theta$ is the parameter to be estimated,
which belongs to a compact subset of $\mathbb{R}^d$, while $h$ is a nuisance parameter
from a totally bounded metric space $(H, d_H)$. In the same way as in the
diffusion process case, we present a general result to derive the asymptotic
normality (and efficiency in some cases) of a Z-estimator for $\theta_0$. Although
there is a vast literature on time series analysis [see, e.g., Taniguchi and
Kakizawa (2000)], our result is apparently new.

The crucial point of our approach is how to show the weak convergence of
the random fields $(\theta,h) \mapsto r_n(\Psi_n(\theta,h) - \tilde\Psi_n(\theta,h))$. For this purpose, we use
the general weak convergence theory for $\ell^\infty$-valued martingales established
by Nishiyama (1996, 1997, 1999, 2000a, 2000b, 2007). The theory is a good
marriage between martingale theory, which has a long history [see, e.g.,
Jacod and Shiryaev (1987)], and the modern theory of empirical processes
[see, e.g., van der Vaart and Wellner (1996)].

The organization of the paper is as follows. In Section 2, we present a gen- 
eral theory for Z-estimation with infinite-dimensional nuisance parameter. 
In Section 3, we prepare a uniform law of large numbers for random fields 
with abstract parameter, which is often used in the course of our work. The 
results for the ergodic diffusion process models are presented in Section 4, 
while those for the ergodic time series models are given in Section 5. 

We refer to van der Vaart and Wellner (1996) for the weak convergence
theory in the space $\ell^\infty(T)$, where $\ell^\infty(T)$ is the space of bounded functions on
a set $T$. We denote by $C_\rho(T)$ the space of functions on $T$ which are
continuous with respect to the metric $\rho$. We equip both spaces with the uniform
metric. Given a probability measure $P$, we denote by $P^*$ the corresponding
outer probability; see van der Vaart and Wellner (1996) for the stochastic
convergence theory which does not assume measurability. We denote by
$\xrightarrow{P^*}$ and $\Rightarrow$ the convergence in (outer) probability and the weak convergence, respectively.
The limit notations mean, in principle, that we take the limit as $n \to \infty$. The
Euclidean metric on $\mathbb{R}^d$ is denoted by $\|\cdot\|$.

2. General theory for semiparametric Z-estimation. Let two sets $\Theta$ and
$H$ be given. Let

$$\Psi_n : \Theta \times H \to \mathbb{R}^d \quad\text{and}\quad \tilde\Psi_n : \Theta \times H \to \mathbb{R}^d$$

be random maps. The latter should be a random "compensator" of the
former; in the i.i.d. case it is neither random nor dependent on $n$.
Compare the above setting with that in Chapter 3.3 of van der Vaart and
Wellner (1996), where $\tilde\Psi_n = \Psi$.

We present a way to derive the asymptotic behaviour of the estimator $\hat\theta_n$
for the parameter $\theta \in \Theta$ of interest, with the help of the estimator $\hat h_n$ for the
nuisance parameter $h \in H$, which are solutions to the estimating equation

$$\Psi_n(\hat\theta_n, \hat h_n) \approx 0.$$

Here, the true values $\theta_0 \in \Theta$ and $h_0 \in H$ are supposed to satisfy

$$\tilde\Psi_n(\theta_0, h_0) \approx 0.$$

The following theorem extends a special case of Theorem 3.3.1 of van der
Vaart and Wellner (1996). See also Theorem 5.21 of van der Vaart (1998).

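Theorem 2.1. Let $\Theta$ be a subset of $\mathbb{R}^d$ with the Euclidean metric $\|\cdot\|$.
Let $(H, d_H)$ be a semimetric space. Let $\Psi_n : \Theta \times H \to \mathbb{R}^d$ and $\tilde\Psi_n : \Theta \times H \to \mathbb{R}^d$
be random maps defined on a probability space $(\Omega_n, \mathcal{F}_n, P_n)$. (We do not
assume any measurability.) Suppose that there exist a sequence of constants
$r_n \uparrow \infty$, some fixed point $(\theta_0, h_0)$ and an invertible matrix $V_{\theta_0,h_0}$ which satisfy
the following (i) and (ii).

(i) There exists a neighborhood $U \subset \Theta \times H$ of $(\theta_0, h_0)$ such that

$$r_n(\Psi_n - \tilde\Psi_n) \Rightarrow Z \quad\text{in } \ell^\infty(U),$$

where almost all paths $(\theta,h) \mapsto Z(\theta,h)$ are continuous with respect to $\rho =
\|\cdot\| \vee d_H$.

(ii) For a given random sequence $(\hat\theta_n, \hat h_n)$, it holds that

$$\tilde\Psi_n(\hat\theta_n,\hat h_n) - \tilde\Psi_n(\theta_0,h_0) - V_{\theta_0,h_0}(\hat\theta_n - \theta_0) = o_{P^*}(r_n^{-1} + \|\hat\theta_n - \theta_0\|)$$

and that

$$\|\hat\theta_n - \theta_0\| \vee d_H(\hat h_n, h_0) = o_{P^*}(1), \qquad \Psi_n(\hat\theta_n,\hat h_n) = o_{P^*}(r_n^{-1}), \qquad \tilde\Psi_n(\theta_0,h_0) = o_{P^*}(r_n^{-1}).$$

Then it holds that

$$r_n(\hat\theta_n - \theta_0) \Rightarrow -V_{\theta_0,h_0}^{-1} Z(\theta_0,h_0).$$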

To prove the above theorem, we need the following lemma, which is a 
slight generalization of Lemma 19.24 of van der Vaart (1998). 



Lemma 2.2. Let $(T,\rho)$ be a semimetric space. Suppose that $Z_n \Rightarrow Z$ in
$\ell^\infty(T)$ and that almost all paths of $Z$ are continuous with respect to $\rho$. If a
$T$-valued random sequence $\hat t_n$ satisfies $\rho(\hat t_n, t_0) = o_{P^*}(1)$ for some nonrandom
$t_0 \in T$, then $Z_n(\hat t_n) - Z_n(t_0) = o_{P^*}(1)$.

Proof. Let us equip the space $\ell^\infty(T) \times T$ with the metric $\|\cdot\|_T +
\rho$, where $\|\cdot\|_T$ denotes the uniform metric on $\ell^\infty(T)$. Define the function
$g : \ell^\infty(T) \times T \to \mathbb{R}$ by $g(z,t) = z(t) - z(t_0)$. Then for any $z \in C_\rho(T)$ and
$t \in T$, the function $g$ is continuous at $(z,t)$. Indeed, if $(z_n,t_n) \to (z,t)$, then
$\|z_n - z\|_T \to 0$, and thus $z_n(t_n) = z(t_n) + o(1) \to z(t)$, while $z_n(t_0) \to z(t_0)$ is
trivial.

By assumption, we have $(Z_n, \hat t_n) \Rightarrow (Z, t_0)$ in $\ell^\infty(T) \times T$ [see, e.g.,
Theorem 18.10(v) of van der Vaart (1998)]. Since almost all paths of $Z$ belong to
$C_\rho(T)$, by the continuous mapping theorem,

$$Z_n(\hat t_n) - Z_n(t_0) = g(Z_n,\hat t_n) \Rightarrow g(Z,t_0) = Z(t_0) - Z(t_0) = 0.$$

The proof is finished. □
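Proof of Theorem 2.1. Applying Lemma 2.2 to the $\ell^\infty(U)$-valued
random element $Z_n = r_n(\Psi_n - \tilde\Psi_n)$, we have

$$r_n(\Psi_n(\hat\theta_n,\hat h_n) - \tilde\Psi_n(\hat\theta_n,\hat h_n)) - r_n(\Psi_n(\theta_0,h_0) - \tilde\Psi_n(\theta_0,h_0)) = o_{P^*}(1).$$

Since $r_n\Psi_n(\hat\theta_n,\hat h_n) = o_{P^*}(1)$, we have

$$-r_n\tilde\Psi_n(\hat\theta_n,\hat h_n) - r_n(\Psi_n(\theta_0,h_0) - \tilde\Psi_n(\theta_0,h_0)) = o_{P^*}(1).$$

By the assumption (ii), it holds that

$$r_n V_{\theta_0,h_0}(\hat\theta_n - \theta_0) = -r_n\Psi_n(\theta_0,h_0) + o_{P^*}(1 + r_n\|\hat\theta_n - \theta_0\|). \tag{2}$$

Now, since

$$r_n\|\hat\theta_n - \theta_0\| \le \|V_{\theta_0,h_0}^{-1}\|\cdot r_n\|V_{\theta_0,h_0}(\hat\theta_n - \theta_0)\| \le O_{P^*}(1) + o_{P^*}(1 + r_n\|\hat\theta_n - \theta_0\|) \quad\text{by (2)},$$

it holds that $r_n\|\hat\theta_n - \theta_0\| = O_{P^*}(1)$. Inserting this into (2), and using
$r_n\tilde\Psi_n(\theta_0,h_0) = o_{P^*}(1)$, we have

$$r_n V_{\theta_0,h_0}(\hat\theta_n - \theta_0) = -r_n(\Psi_n(\theta_0,h_0) - \tilde\Psi_n(\theta_0,h_0)) + o_{P^*}(1) \Rightarrow -Z(\theta_0,h_0),$$

which implies the conclusion. □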

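In Theorem 2.1, both "$\|\hat\theta_n - \theta_0\| = o_{P^*}(1)$" and "$d_H(\hat h_n, h_0) = o_{P^*}(1)$" are
assumed. Under some conditions, the former automatically follows from the
latter, as is seen in the following theorem.

Theorem 2.3. Let $(\Theta, d_\Theta)$ and $(H, d_H)$ be two semimetric spaces. Let
$\Psi_n : \Theta \times H \to \mathbb{R}^d$ be a random map defined on a probability space $(\Omega_n, \mathcal{F}_n, P_n)$.
(We do not assume any measurability.) Let $\Psi : \Theta \times H \to \mathbb{R}^d$ be a nonrandom
function. Suppose that

$$\sup_{(\theta,h)\in\Theta\times H} \|\Psi_n(\theta,h) - \Psi(\theta,h)\| = o_{P^*}(1).$$

Suppose also that for some $(\theta_0, h_0) \in \Theta \times H$

$$\inf_{\theta : d_\Theta(\theta,\theta_0) > \varepsilon} \|\Psi(\theta,h_0)\| > 0 \qquad \forall \varepsilon > 0,$$

and that $h \mapsto \Psi(\theta,h)$ is continuous at $h_0$ uniformly in $\theta$. Then for any random
sequence $(\hat\theta_n,\hat h_n)$ such that $\Psi_n(\hat\theta_n,\hat h_n) = o_{P^*}(1)$ and $d_H(\hat h_n,h_0) = o_{P^*}(1)$,
it holds that $d_\Theta(\hat\theta_n,\theta_0) = o_{P^*}(1)$.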
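Proof. Observe that

$$\|\Psi(\hat\theta_n,\hat h_n)\| \le \|\Psi(\hat\theta_n,\hat h_n) - \Psi_n(\hat\theta_n,\hat h_n)\| + \|\Psi_n(\hat\theta_n,\hat h_n)\| \le \sup_{\theta,h}\|\Psi(\theta,h) - \Psi_n(\theta,h)\| + \|\Psi_n(\hat\theta_n,\hat h_n)\| = o_{P^*}(1).$$

Now, for every $\varepsilon > 0$, there exist $\delta, \eta > 0$ such that $\|\Psi(\theta,h)\| > \eta$ for
every $\theta$ with $d_\Theta(\theta,\theta_0) > \varepsilon$ and every $h$ with $d_H(h,h_0) < \delta$. Thus, the event
$\{d_\Theta(\hat\theta_n,\theta_0) > \varepsilon\}$ is contained in the event $\{\|\Psi(\hat\theta_n,\hat h_n)\| > \eta\} \cup \{d_H(\hat h_n,h_0) \ge
\delta\}$. The outer probability of the latter event converges to 0. □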

3. Uniform law of large numbers. In this section, we give a uniform law 
of large numbers for ergodic processes, under a smoothness assumption. The 
proof is standard, so it is omitted. [See, e.g., Theorem 2.4.1 of van der Vaart 
and Wellner (1996) for the idea, or see Nishiyama (2009).] 

Theorem 3.1. Let $(E, \mathcal{E})$ be a measurable space. Let $\Theta$ be a set which is
totally bounded with respect to the semimetric $\rho$. Let a family $\{f(\cdot;\theta); \theta \in \Theta\}$
of measurable functions on $E$ be given. Suppose that there exists a measurable
function $K$ such that

$$|f(x;\theta) - f(x;\theta')| \le K(x)\,\rho(\theta,\theta') \qquad \forall \theta,\theta' \in \Theta. \tag{3}$$

(i) Suppose that the $E$-valued random process $\{X_t\}_{t\in[0,\infty)}$ is ergodic with
the invariant law $\mu$, that is, for any $\mu$-integrable function $g$,

$$\frac{1}{T}\int_0^T g(X_t)\,dt \xrightarrow{P} \int_E g(x)\,\mu(dx).$$

If all $f(\cdot;\theta)$ and $K$ are $\mu$-integrable, then

$$\sup_{\theta\in\Theta}\left|\frac{1}{T}\int_0^T f(X_t;\theta)\,dt - \int_E f(x;\theta)\,\mu(dx)\right| = o_{P^*}(1).$$

(ii) Suppose that the $E$-valued random process $\{X_i\}_{i=1,2,\dots}$ is ergodic with
the invariant law $\mu$, that is, for any $\mu$-integrable function $g$,

$$\frac{1}{n}\sum_{i=1}^n g(X_i) \xrightarrow{P} \int_E g(x)\,\mu(dx).$$

If all $f(\cdot;\theta)$ and $K$ are $\mu$-integrable, then

$$\sup_{\theta\in\Theta}\left|\frac{1}{n}\sum_{i=1}^n f(X_i;\theta) - \int_E f(x;\theta)\,\mu(dx)\right| = o_{P^*}(1).$$

Remark. The smoothness assumption (3) can be replaced by
"bracketing." See Theorem 2.4.1 of van der Vaart and Wellner (1996).
4. Ergodic diffusion processes.

4.1. Regularity conditions. Let us consider the diffusion process model
introduced in the first paragraph of Section 1. We list some conditions.
We suppose that there exists a parametric family of $d$-dimensional
vector-valued functions $\{\dot S(\cdot;\theta); \theta \in \Theta\}$ on $I$ which satisfies the following
conditions. Typically, they may be considered to be the derivatives of $S(\cdot;\theta)$
with respect to $\theta$, that is, $\dot S(\cdot;\theta) = (\frac{\partial}{\partial\theta_1}S(\cdot;\theta), \dots, \frac{\partial}{\partial\theta_d}S(\cdot;\theta))^T$. The function
$\Lambda$ appearing in A1 and A3 may be chosen to be common without loss of
generality.

A1. $\Theta$ is a compact subset of $\mathbb{R}^d$. There exists a measurable function $\Lambda$
on $I$ such that at the true $\theta_0 \in \Theta$,

$$S(x;\theta) - S(x;\theta_0) = \dot S(x;\theta_0)^T(\theta - \theta_0) + \Lambda(x)\,\epsilon(x;\theta,\theta_0),$$

where $\sup_{x\in I}|\epsilon(x;\theta,\theta_0)| = o(\|\theta - \theta_0\|)$ as $\theta \to \theta_0$.

A2. There exists a constant $K > 0$ such that

$$\sup_{\theta\in\Theta}|S(x;\theta) - S(x';\theta)| \le K|x - x'|;$$
$$\sup_{\theta\in\Theta}\|\dot S(x;\theta) - \dot S(x';\theta)\| \le K|x - x'|;$$
$$\sup_{h\in H}|\sigma^2(x;h) - \sigma^2(x';h)| \le K|x - x'|.$$

A3. There exists a measurable function $\Lambda$ on $I$ such that

$$\sup_{\theta\in\Theta}|S(x;\theta)| \le \Lambda(x);$$
$$\sup_{\theta\in\Theta}\|\dot S(x;\theta)\| \le \Lambda(x);$$
$$c := \inf_{h\in H}\inf_{x\in I}\sigma^2(x;h) > 0;$$
$$\|\dot S(x;\theta) - \dot S(x;\theta')\| \le \Lambda(x)\,\|\theta - \theta'\| \qquad \forall\theta,\theta'\in\Theta;$$
$$|\sigma^2(x;h) - \sigma^2(x;h')| \le \Lambda(x)\,d_H(h,h') \qquad \forall h,h'\in H.$$
A4. suvt^^E{K{Xtf + \Xt\^) 

A5. The process X = {Xt)t^yQ^oo) is ergodic. We denote by ^ the invariant 
measure under the true {Bq, /iq), and we assume that it satisfies A(x)^(l -|- 
\x\)^{dx) < oo. 

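A6. The matrix

$$I(\theta_0,h_0) = \int_I \frac{\dot S(x;\theta_0)\dot S(x;\theta_0)^T}{\sigma^2(x;h_0)}\,\mu(dx)$$

is invertible.

A7. The metric entropy condition for $(H, d_H)$ is satisfied:

$$\int_0^1 \sqrt{\log N(H, d_H, \varepsilon)}\,d\varepsilon < \infty.$$

A8. For every $\varepsilon > 0$,

$$\inf_{\theta : \|\theta - \theta_0\| > \varepsilon}\left\|\int_I \frac{\dot S(x;\theta)}{\sigma^2(x;h_0)}\,[S(x;\theta_0) - S(x;\theta)]\,\mu(dx)\right\| > 0.$$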



Remark. The last assumption in A2 implies that $\sigma^2(x;h_0) \le C(1 + |x|)$
for a constant $C > 0$.

To close this subsection, let us discuss the possible choices of the
nuisance parameter space $(H, d_H)$.

Example 1 (Parametric model). When $(H, d_H)$ is a compact subset
of a finite-dimensional Euclidean space, the metric entropy condition A7
is indeed satisfied. So the main restriction is the Lipschitz continuity of
$h \mapsto \sigma^2(\cdot;h)$ in A3. This situation is more general than that in Yoshida
(1992) and Kessler (1997), although, as announced in Section 1, our result
does not include theirs.

Example 2 (The class of smooth functions). Let us consider the
parametrization $\sigma(x;h) = h(x)$ where $h$ is an element of the class $H = C_M^\alpha(I)$
defined below. We equip the function space $H$ with the uniform metric $\|\cdot\|_\infty$,
for which the last requirement in A3 is always fulfilled. To check A7, first
we consider the case where $I$ is a bounded subset of $\mathbb{R}$, and next we give
some remarks for the general case.

We take the material below from Section 2.7.1 of van der Vaart and
Wellner (1996). Let $I$ be a bounded, convex subset of $\mathbb{R}^q$. (In the current example
of one-dimensional diffusions, we are considering the case $q = 1$, but for
generality we let $q$ be a general positive integer; see Section 5.) Let $\alpha > 0$
and $M > 0$ be given, and let $\underline{\alpha}$ be the greatest integer smaller than $\alpha$. For
any vector $k = (k_1,\dots,k_q)$ of $q$ integers, we define the differential operator

$$D^k = \frac{\partial^{k_\cdot}}{\partial x_1^{k_1}\cdots\partial x_q^{k_q}},$$

where $k_\cdot = \sum_{i=1}^q k_i$. We denote by $C_M^\alpha(I)$ the class of functions $h$ defined on $I$
such that

$$\max_{k_\cdot \le \underline{\alpha}}\,\sup_x |D^k h(x)| + \max_{k_\cdot = \underline{\alpha}}\,\sup_{x,y}\frac{|D^k h(x) - D^k h(y)|}{\|x - y\|^{\alpha - \underline{\alpha}}} \le M,$$

where the suprema are taken over all $x, y$ in the interior of $I$ with $x \ne y$.
Then there exists a constant $K > 0$, depending only on $\alpha$ and $q$, such that

$$\log N(C_M^\alpha(I), \|\cdot\|_\infty, \varepsilon) \le K\lambda(I^1)\left(\frac{M}{\varepsilon}\right)^{q/\alpha},$$

where $\lambda(I^1)$ is the Lebesgue measure of the set $\{x : \|x - I\| < 1\}$. Hence,
the metric entropy condition A7 is satisfied if $q/(2\alpha) < 1$, and therefore our
theory works.
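The role of the threshold $q/(2\alpha) < 1$ can be checked by a one-line computation: up to a multiplicative constant, the entropy bound makes the integrand of A7 behave like $\varepsilon^{-q/(2\alpha)}$ near zero. The short sketch below evaluates $\int_0^1 \varepsilon^{-q/(2\alpha)}\,d\varepsilon$ in closed form; the function name is hypothetical and the constant factor is deliberately ignored.

```python
def entropy_integral(q, alpha):
    """Value of the integral of eps**(-q/(2*alpha)) over (0, 1]; infinite if q/(2*alpha) >= 1."""
    s = q / (2.0 * alpha)
    return 1.0 / (1.0 - s) if s < 1.0 else float("inf")

# q = 1 (one-dimensional diffusion): any smoothness alpha > 1/2 gives a finite integral.
finite_case = entropy_integral(1, 1.0)
borderline_case = entropy_integral(1, 0.5)
```

Under this reading, smoother classes (larger $\alpha$) or lower dimension $q$ make the entropy condition easier to satisfy.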

When $I = \mathbb{R}^q$, we may restrict our attention, for example, to the following
class $H$ of functions on $\mathbb{R}^q$. Let $I_0$ be a bounded, convex subset of $\mathbb{R}^q$, and
suppose that the restriction of $h \in H$ to $I_0$ belongs to $C_M^\alpha(I_0)$ and that

$$\sup_{x\in\mathbb{R}^q}|h(x) - h'(x)| \le L\sup_{x\in I_0}|h(x) - h'(x)| \qquad \forall h,h' \in H, \tag{4}$$

for a constant $L > 0$. Then both the last condition of A3 and A7 are
satisfied. The condition (4) is satisfied if we assume, for example, either of the
following:

(i) $h$ is known on $I_0^c$;

(ii) when $q = 1$ and $I_0 = [l_0, r_0]$, each $h$ is constant on $(-\infty, l_0]$ and on
$[r_0, \infty)$.

Although the examples (i) and (ii) might look restrictive, it should be
noted that in practice we can choose an arbitrarily large $I_0$.

Another way to deal with the unbounded case $I = \mathbb{R}^q$ is to consider the
parametrization

$$\sigma(x;h) = h(u(x)), \qquad h \in C_M^\alpha(I_0),$$

where $I_0$ is a bounded, convex subset of $\mathbb{R}^{q'}$ and $u : \mathbb{R}^q \to I_0$ is a fixed
function. If $q'/(2\alpha) < 1$, then both the last requirement in A3 and the metric
entropy condition A7 are satisfied for the uniform metric $d_H = \|\cdot\|_\infty$.

Instead of $C_M^\alpha(I)$, another possible choice of $H$ which satisfies
the metric entropy condition for the uniform metric is the Sobolev class; see
Example 19.10 of van der Vaart (1998).

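4.2. Results. As announced in Section 1, we propose to use the
estimating function

$$\Psi_n(\theta,h) = \frac{1}{t_n^n}\sum_{i=1}^n \frac{\dot S(X_{t_{i-1}^n};\theta)}{\sigma^2(X_{t_{i-1}^n};h)}\,\bigl[X_{t_i^n} - X_{t_{i-1}^n} - S(X_{t_{i-1}^n};\theta)\,|t_i^n - t_{i-1}^n|\bigr],$$

whose compensator is

$$\tilde\Psi_n(\theta,h) = \frac{1}{t_n^n}\sum_{i=1}^n \frac{\dot S(X_{t_{i-1}^n};\theta)}{\sigma^2(X_{t_{i-1}^n};h)}\,\Bigl[\int_{t_{i-1}^n}^{t_i^n} S(X_s;\theta_0)\,ds - S(X_{t_{i-1}^n};\theta)\,|t_i^n - t_{i-1}^n|\Bigr].$$

Then we have the following two lemmas.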
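Lemma 4.1. Assume $\Delta_n \to 0$ and $t_n^n \to \infty$. Equip the space $\Theta \times H$ with
the metric $\rho = \|\cdot\| \vee d_H$. Under A1-A5 and A7, $\sqrt{t_n^n}(\Psi_n - \tilde\Psi_n)$ converges
weakly in $C_\rho(\Theta \times H)$ to a zero-mean Gaussian process $Z$ with the covariance

$$EZ(\theta,h)Z(\theta',h')^T = \int_I \frac{\dot S(x;\theta)\dot S(x;\theta')^T}{\sigma^2(x;h)\,\sigma^2(x;h')}\,\sigma^2(x;h_0)\,\mu(dx).$$

In particular, the random variable $Z(\theta_0,h_0)$ is distributed as $\mathcal{N}(0, I(\theta_0,h_0))$.

Lemma 4.2. Assume $\Delta_n = o((t_n^n)^{-1})$ and $t_n^n \to \infty$. Under A1-A6, for
any random sequence $(\hat\theta_n,\hat h_n)$ such that $\|\hat\theta_n - \theta_0\| \vee d_H(\hat h_n,h_0) = o_{P^*}(1)$, it
holds that

$$\tilde\Psi_n(\hat\theta_n,\hat h_n) - \tilde\Psi_n(\theta_0,h_0) - (-I(\theta_0,h_0))(\hat\theta_n - \theta_0) = o_{P^*}((t_n^n)^{-1/2} + \|\hat\theta_n - \theta_0\|).$$

Combining these lemmas with Theorem 2.1, and noting also that $\tilde\Psi_n(\theta_0,h_0) =
O_P(\Delta_n^{1/2})$, which will be proved by using Lemma 4.5 below, we can conclude
the following theorem.

Theorem 4.3. Assume $\Delta_n = o((t_n^n)^{-1})$ and $t_n^n \to \infty$. Under A1-A7, for
any random sequence $(\hat\theta_n,\hat h_n)$ such that

$$\|\hat\theta_n - \theta_0\| = o_{P^*}(1), \qquad d_H(\hat h_n,h_0) = o_{P^*}(1)$$

and

$$\Psi_n(\hat\theta_n,\hat h_n) = o_{P^*}((t_n^n)^{-1/2}),$$

the estimator $\hat\theta_n$ is asymptotically normal and efficient:

$$\sqrt{t_n^n}\,(\hat\theta_n - \theta_0) \Rightarrow \mathcal{N}(0, I(\theta_0,h_0)^{-1}).$$

When A8 is also satisfied, the assumption "$\|\hat\theta_n - \theta_0\| = o_{P^*}(1)$" is
automatically satisfied.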

In the above theorem, the only assumption which we cannot check in
the course of computing with the data is the consistency "$d_H(\hat h_n,h_0) = o_{P^*}(1)$,"
because it involves the true value $h_0$ of the unknown parameter $h \in H$. When
$\{\sigma^2(\cdot;h); h \in H\}$ is a class of functions $\sigma^2(\cdot;h) = h(\cdot)$ where $H$ is a class of
smooth functions, one may think that a kernel estimator is a candidate
for $\hat h_n$. As stated above, in view of the Lipschitz condition on $h \mapsto \sigma^2(\cdot;h)$
(the last condition in A3), it is convenient to consider the consistency with
respect to the uniform metric. However, showing the consistency of the kernel
estimator with respect to the uniform metric is not an easy task. Generally speaking,
showing the consistency for an infinite-dimensional parameter is not a trivial
problem, and it should be treated in separate articles. See, for example,
Hoffmann (2001). Below, we give a general way to show the consistency of
a least squares estimator.
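Theorem 4.4. Assume $\Delta_n \to 0$ and $t_n^n \to \infty$. Assume A2-A5 and A7.
Suppose that

$$\inf_{h : d_H(h,h_0) > \varepsilon}\int_I |\sigma^2(x;h) - \sigma^2(x;h_0)|^2\,\mu(dx) > 0 \qquad \forall\varepsilon > 0$$

is satisfied. If the random element $\hat h_n$ satisfies $\mathcal{A}_n(\hat h_n) \le \inf_{h\in H}\mathcal{A}_n(h) +
o_{P^*}(1)$ where

$$\mathcal{A}_n(h) = \frac{1}{t_n^n}\sum_{i=1}^n\left|\frac{|X_{t_i^n} - X_{t_{i-1}^n}|^2}{|t_i^n - t_{i-1}^n|} - \sigma^2(X_{t_{i-1}^n};h)\right|^2 |t_i^n - t_{i-1}^n|,$$

then it holds that $d_H(\hat h_n,h_0) = o_{P^*}(1)$.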
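A least squares criterion of this kind can be evaluated directly from the observations. The sketch below assumes that the contrast compares the normalized squared increments $|X_{t_i^n} - X_{t_{i-1}^n}|^2/|t_i^n - t_{i-1}^n|$ with $\sigma^2(X_{t_{i-1}^n};h)$ and minimizes over a finite list of candidate functions; the finite candidate list, the function names and the toy data are assumptions of this example, whereas the theorem itself allows a general totally bounded class $H$.

```python
import numpy as np

def contrast(x, t, sigma2):
    """Least squares contrast; sigma2 stands for the map x -> sigma^2(x; h), t[0] assumed 0."""
    x_prev, dx, dt = x[:-1], np.diff(x), np.diff(t)
    return np.sum((dx ** 2 / dt - sigma2(x_prev)) ** 2 * dt) / t[-1]

def least_squares_h(x, t, candidates):
    """Index of the candidate minimizing the contrast, together with all contrast values."""
    values = [contrast(x, t, s2) for s2 in candidates]
    return int(np.argmin(values)), values

# Toy data whose squared increments equal 2 * dt exactly, so sigma^2 = 2 fits perfectly.
t = np.linspace(0.0, 1.0, 101)
signs = np.where(np.arange(100) % 2 == 0, 1.0, -1.0)
x = np.concatenate([[0.0], np.cumsum(signs * np.sqrt(2.0 * np.diff(t)))])
candidates = [lambda y: np.full_like(y, 1.0),
              lambda y: np.full_like(y, 2.0),
              lambda y: np.full_like(y, 3.0)]
best, values = least_squares_h(x, t, candidates)
```

In practice the minimization would be carried out over a sieve or a parametrized subclass of $H$ rather than a short list.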

4.3. Proofs. Before the proofs, we state a lemma which is well known.

Lemma 4.5. Let $X$ be a solution to the SDE (1) for $(\theta,h) = (\theta_0,h_0)$.
Assume $|t_i^n - t_{i-1}^n| \le 1$.

(i) For any $k \ge 2$, there exists a constant $C_k > 0$, depending only on $k$,
such that

$$E\sup_{t\in[t_{i-1}^n,t_i^n]}|X_t - X_{t_{i-1}^n}|^k \le C_k\sup_s E\{|S(X_s;\theta_0)|^k + |\sigma(X_s;h_0)|^k\}\,|t_i^n - t_{i-1}^n|^{k/2} =: D_k\,|t_i^n - t_{i-1}^n|^{k/2},$$

provided the right-hand side is finite.

(ii) For any $k \ge 2$ and any measurable functions $f$, $g$, it holds that

$$\sup_{t\in[t_{i-1}^n,t_i^n]} E\bigl(|X_t - X_{t_{i-1}^n}|^{k/2}\,|f(X_{t_{i-1}^n})|\,|g(X_t)|\bigr) \le (D_k\,|t_i^n - t_{i-1}^n|^{k/2})^{1/2}\,\sup_s(E|f(X_s)|^4)^{1/4}\,\sup_s(E|g(X_s)|^4)^{1/4},$$

provided the right-hand side is finite.

Proof. The assertion (i) is well known. (Use Hölder's inequality
and the Burkholder-Davis-Gundy inequality for $\int_{t_{i-1}^n}^{t_i^n}|S(X_s;\theta_0)|\,ds$ and
$\sup_{t\in[t_{i-1}^n,t_i^n]}|\int_{t_{i-1}^n}^{t}\sigma(X_s;h_0)\,dW_s|$.) The assertion (ii) follows from Hölder's
inequality and (i). □

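During the proofs, we write

$$\psi(x;\theta,h) = \frac{\dot S(x;\theta)}{\sigma^2(x;h)},$$

which is a $d$-dimensional vector-valued function. For each component $\psi^{(j)}(x;\theta,h)$
($j = 1,\dots,d$), it holds that

$$|\psi^{(j)}(x;\theta,h) - \psi^{(j)}(x';\theta,h)| \le \frac{|\dot S^{(j)}(x;\theta) - \dot S^{(j)}(x';\theta)|}{\sigma^2(x;h)} + |\dot S^{(j)}(x';\theta)|\left|\frac{1}{\sigma^2(x;h)} - \frac{1}{\sigma^2(x';h)}\right| \le \left\{\frac{K}{c} + \frac{\Lambda(x')K}{c^2}\right\}|x - x'| \tag{5}$$

and that

$$|\psi^{(j)}(x;\theta,h) - \psi^{(j)}(x;\theta',h')| \le \frac{|\dot S^{(j)}(x;\theta) - \dot S^{(j)}(x;\theta')|}{\sigma^2(x;h)} + |\dot S^{(j)}(x;\theta')|\left|\frac{1}{\sigma^2(x;h)} - \frac{1}{\sigma^2(x;h')}\right| \le \left\{\frac{\Lambda(x)}{c} + \frac{|\Lambda(x)|^2}{c^2}\right\}(\|\theta - \theta'\| \vee d_H(h,h')). \tag{6}$$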



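Proof of Lemma 4.1. We apply Theorem 3.4.2 of Nishiyama (2000b)
[or, see Theorem 3.3 of van der Vaart and van Zanten (2005)] to the terminals
$M^{n,\theta,h}_{t_n^n}$ of the continuous martingales $t \mapsto M^{n,\theta,h}_t$ given by

$$M^{n,\theta,h}_t = \frac{1}{\sqrt{t_n^n}}\sum_{i=1}^n \psi(X_{t_{i-1}^n};\theta,h)\int_{t_{i-1}^n\wedge t}^{t_i^n\wedge t}\sigma(X_s;h_0)\,dW_s.$$

For the finite-dimensional convergence, it is sufficient to show the
convergence of the predictable covariation. This is done as follows:

$$\langle M^{n,\theta,h}, M^{n,\theta',h'}\rangle_{t_n^n} = \frac{1}{t_n^n}\sum_{i=1}^n \psi(X_{t_{i-1}^n};\theta,h)\,\psi(X_{t_{i-1}^n};\theta',h')^T\int_{t_{i-1}^n}^{t_i^n}\sigma^2(X_s;h_0)\,ds$$
$$= \frac{1}{t_n^n}\int_0^{t_n^n}\psi(X_t;\theta,h)\,\psi(X_t;\theta',h')^T\sigma^2(X_t;h_0)\,dt + o_P(1) \tag{7}$$
$$\xrightarrow{P} \int_I \psi(x;\theta,h)\,\psi(x;\theta',h')^T\sigma^2(x;h_0)\,\mu(dx)$$
$$= \int_I \frac{\dot S(x;\theta)\dot S(x;\theta')^T}{\sigma^2(x;h)\,\sigma^2(x;h')}\,\sigma^2(x;h_0)\,\mu(dx)$$
$$= I(\theta_0,h_0) \quad\text{if } (\theta,h) = (\theta',h') = (\theta_0,h_0).$$

Here, to show (7), we have used the bound (5) and Lemma 4.5(ii) twice.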

To establish Nishiyama's condition [ME], let us observe the following fact
to check the metric entropy condition for the product space $\Theta \times H$.

In general, if $(D,d)$ and $(E,e)$ are two semimetric spaces, then the
covering number of the product space $D \times E$ with respect to the maximum
semimetric $d \vee e$, namely $N(D \times E, d \vee e, \varepsilon)$, is bounded by $N(D,d,\varepsilon)\cdot N(E,e,\varepsilon)$.
To see this claim, let $B_i$, $i = 1,\dots,N(D,d,\varepsilon)$, be an $\varepsilon$-covering of $D$, and let
$C_j$, $j = 1,\dots,N(E,e,\varepsilon)$, be an $\varepsilon$-covering of $E$. Then the
sets $B_i \times C_j \subset D \times E$ are balls of radius $\varepsilon$ with respect to $d \vee e$, thus these
sets form an $\varepsilon$-covering of $D \times E$. The claim has been proved. Consequently,
the metric entropy condition

$$\int_0^1\sqrt{\log N(D \times E, d \vee e, \varepsilon)}\,d\varepsilon < \infty$$

is satisfied if

$$\int_0^1\sqrt{\log N(D,d,\varepsilon)}\,d\varepsilon < \infty \quad\text{and}\quad \int_0^1\sqrt{\log N(E,e,\varepsilon)}\,d\varepsilon < \infty.$$
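The product bound on covering numbers can be checked numerically on finite sets. The following sketch builds $\varepsilon$-nets of two finite subsets of the real line with a simple greedy rule and verifies that the product of the nets is an $\varepsilon$-net for the product set under the maximum metric. The greedy construction, the helper names and the particular sets are assumptions made for this illustration; the argument in the text does not depend on them.

```python
import itertools

def greedy_net(points, dist, eps):
    """Greedy eps-net: a point becomes a center only if it is farther than eps from all centers."""
    centers = []
    for p in points:
        if all(dist(p, c) > eps for c in centers):
            centers.append(p)
    return centers

def is_eps_net(points, centers, dist, eps):
    """True if every point lies within eps of at least one center."""
    return all(min(dist(p, c) for c in centers) <= eps for p in points)

abs_dist = lambda a, b: abs(a - b)
max_dist = lambda p, r: max(abs(p[0] - r[0]), abs(p[1] - r[1]))  # the metric d v e

D = [i / 20 for i in range(21)]   # finite subset of [0, 1]
E = [j / 20 for j in range(41)]   # finite subset of [0, 2]
eps = 0.25
net_D = greedy_net(D, abs_dist, eps)
net_E = greedy_net(E, abs_dist, eps)
product_net = list(itertools.product(net_D, net_E))
covers = is_eps_net(list(itertools.product(D, E)), product_net, max_dist, eps)
```

The size of `product_net` is exactly `len(net_D) * len(net_E)`, which mirrors the bound $N(D \times E, d \vee e, \varepsilon) \le N(D,d,\varepsilon)\cdot N(E,e,\varepsilon)$.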



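Now, since $\Theta$ is compact with respect to the Euclidean metric, the metric
entropy condition for $\Theta$ is satisfied. So, with A7 in hand, the metric entropy
condition for the product space $\Theta \times H$ is satisfied, and it remains only to
show that the quadratic modulus is bounded in probability; that is, the claim
that each component of the matrix

$$\sup_{(\theta,h)\ne(\theta',h')}\frac{\langle M^{n,\theta,h} - M^{n,\theta',h'}\rangle_{t_n^n}}{(\|\theta - \theta'\| \vee d_H(h,h'))^2}$$

is bounded in probability. In view of (6), the absolute value of each
component of this matrix is bounded by

$$\frac{1}{t_n^n}\sum_{i=1}^n\left\{\frac{\Lambda(X_{t_{i-1}^n})}{c} + \frac{|\Lambda(X_{t_{i-1}^n})|^2}{c^2}\right\}^2\int_{t_{i-1}^n}^{t_i^n}\sigma^2(X_s;h_0)\,ds.$$

The expectation of this random variable is bounded by

$$\sup_s\sqrt{E\left\{\frac{\Lambda(X_s)}{c} + \frac{|\Lambda(X_s)|^2}{c^2}\right\}^4}\cdot\sup_s\sqrt{E\sigma^4(X_s;h_0)},$$

which is $O(1)$ by A4. Thus, the quadratic modulus is bounded in probability.
□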
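Proof of Lemma 4.2. It follows from Lemma 4.5 that uniformly in
$(\theta,h)$,

$$\tilde\Psi_n(\theta,h) - \tilde\Psi_n(\theta_0,h_0) = \frac{1}{t_n^n}\sum_{i=1}^n\psi(X_{t_{i-1}^n};\theta,h)\int_{t_{i-1}^n}^{t_i^n}[S(X_t;\theta_0) - S(X_t;\theta)]\,dt + O_P(\Delta_n^{1/2})$$
$$= \frac{1}{t_n^n}\int_0^{t_n^n}\psi(X_t;\theta,h)\,[S(X_t;\theta_0) - S(X_t;\theta)]\,dt + O_P(\Delta_n^{1/2}).$$

The remainder term of this approximation is actually $o_P((t_n^n)^{-1/2})$.
Furthermore, it holds for any (possibly random) sequence $(\hat\theta_n,\hat h_n)$ converging
in outer probability to $(\theta_0,h_0)$ that

$$\frac{1}{t_n^n}\int_0^{t_n^n}\psi(X_t;\hat\theta_n,\hat h_n)\,[S(X_t;\theta_0) - S(X_t;\hat\theta_n)]\,dt$$
$$= \frac{1}{t_n^n}\int_0^{t_n^n}\psi(X_t;\hat\theta_n,\hat h_n)\,\dot S(X_t;\theta_0)^T\,dt\,(\theta_0 - \hat\theta_n) + o_{P^*}(\|\hat\theta_n - \theta_0\|)$$
$$= \frac{1}{t_n^n}\int_0^{t_n^n}\psi(X_t;\theta_0,h_0)\,\dot S(X_t;\theta_0)^T\,dt\,(\theta_0 - \hat\theta_n) + o_{P^*}(\|\hat\theta_n - \theta_0\|) \tag{8}$$
$$= -I(\theta_0,h_0)(\hat\theta_n - \theta_0) + o_{P^*}(\|\hat\theta_n - \theta_0\|).$$

To prove (8) in the above computation, use (6) to show that for every $j,k =
1,\dots,d$,

$$\left|\frac{1}{t_n^n}\int_0^{t_n^n}\psi^{(j)}(X_t;\hat\theta_n,\hat h_n)\,\dot S^{(k)}(X_t;\theta_0)\,dt - \frac{1}{t_n^n}\int_0^{t_n^n}\psi^{(j)}(X_t;\theta_0,h_0)\,\dot S^{(k)}(X_t;\theta_0)\,dt\right|$$
$$\le \frac{1}{t_n^n}\int_0^{t_n^n}\left\{\frac{\Lambda(X_t)}{c} + \frac{|\Lambda(X_t)|^2}{c^2}\right\}|\dot S^{(k)}(X_t;\theta_0)|\,dt\cdot(\|\hat\theta_n - \theta_0\| \vee d_H(\hat h_n,h_0))$$
$$= O_P(1)\cdot o_{P^*}(1) = o_{P^*}(1).$$

The proof is complete. □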

Proof of Theorem 4.3. By Lemma 4.5, it is easy to see that $\tilde\Psi_n(\theta_0,h_0)
= O_P(\Delta_n^{1/2}) = o_P((t_n^n)^{-1/2})$. So the main assertion follows from Theorem 2.1
with the help of Lemmas 4.1 and 4.2. On the other hand, it follows from
Lemma 4.5(ii) and Theorem 3.1(i) that $\sup_{\theta,h}\|\Psi_n(\theta,h) - \Psi(\theta,h)\| = o_{P^*}(1)$,
where

$$\Psi(\theta,h) = \int_I\psi(x;\theta,h)\,[S(x;\theta_0) - S(x;\theta)]\,\mu(dx),$$

so the assertion that the consistency "$\|\hat\theta_n - \theta_0\| = o_{P^*}(1)$" automatically follows
from A8 is immediate from Theorem 2.3. □

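Proof of Theorem 4.4. Put

$$M_n(h) = \frac{1}{t_n^n}\sum_{i=1}^n|\sigma^2(X_{t_{i-1}^n};h) - \sigma^2(X_{t_{i-1}^n};h_0)|^2\,|t_i^n - t_{i-1}^n|, \qquad M(h) = \int_I|\sigma^2(x;h) - \sigma^2(x;h_0)|^2\,\mu(dx).$$

Let us apply Corollary 3.2.3 of van der Vaart and Wellner (1996) to the above
$M_n$ and $M$ for the given $\hat h_n$ which satisfies $\mathcal{A}_n(\hat h_n) \le \inf_{h\in H}\mathcal{A}_n(h) +
o_{P^*}(1)$. By Lemma 4.5(ii) and Theorem 3.1(i), it is not difficult to see that
$\sup_{h\in H}|M_n(h) - M(h)| = o_{P^*}(1)$, so it is sufficient to show that $M_n(\hat h_n) =
o_{P^*}(1)$.

Observe that

$$\mathcal{A}_n(h) = \frac{1}{t_n^n}\sum_{i=1}^n\left|\frac{|X_{t_i^n} - X_{t_{i-1}^n}|^2}{|t_i^n - t_{i-1}^n|} - \sigma^2(X_{t_{i-1}^n};h_0)\right|^2|t_i^n - t_{i-1}^n|$$
$$+ \frac{2}{t_n^n}\sum_{i=1}^n\left(\frac{|X_{t_i^n} - X_{t_{i-1}^n}|^2}{|t_i^n - t_{i-1}^n|} - \sigma^2(X_{t_{i-1}^n};h_0)\right)\times(\sigma^2(X_{t_{i-1}^n};h_0) - \sigma^2(X_{t_{i-1}^n};h))\,|t_i^n - t_{i-1}^n|$$
$$+ \frac{1}{t_n^n}\sum_{i=1}^n|\sigma^2(X_{t_{i-1}^n};h_0) - \sigma^2(X_{t_{i-1}^n};h)|^2\,|t_i^n - t_{i-1}^n|.$$

Let us prove that the supremum with respect to $h$ of the absolute value of
the second term on the right-hand side converges in outer probability to zero
[say, the claim (a)].

Since we have from Itô's formula that

$$|X_{t_i^n} - X_{t_{i-1}^n}|^2 = 2\int_{t_{i-1}^n}^{t_i^n}(X_s - X_{t_{i-1}^n})S(X_s;\theta_0)\,ds + 2\int_{t_{i-1}^n}^{t_i^n}(X_s - X_{t_{i-1}^n})\sigma(X_s;h_0)\,dW_s + \int_{t_{i-1}^n}^{t_i^n}\sigma^2(X_s;h_0)\,ds,$$

it is sufficient to show that $C_{1,n} = o_P(1)$, $\sup_{h\in H}|C_{2,n}(h)| = o_{P^*}(1)$ and
$C_{3,n} = o_P(1)$, where

$$C_{1,n} = \frac{1}{t_n^n}\sum_{i=1}^n\left|\int_{t_{i-1}^n}^{t_i^n}(X_s - X_{t_{i-1}^n})S(X_s;\theta_0)\,ds\right|\Lambda(X_{t_{i-1}^n}),$$
$$C_{2,n}(h) = \frac{1}{t_n^n}\sum_{i=1}^n\int_{t_{i-1}^n}^{t_i^n}(X_s - X_{t_{i-1}^n})\sigma(X_s;h_0)\,dW_s\,(\sigma^2(X_{t_{i-1}^n};h_0) - \sigma^2(X_{t_{i-1}^n};h)),$$
$$C_{3,n} = \frac{1}{t_n^n}\sum_{i=1}^n\int_{t_{i-1}^n}^{t_i^n}|\sigma^2(X_s;h_0) - \sigma^2(X_{t_{i-1}^n};h_0)|\,ds\,\Lambda(X_{t_{i-1}^n}).$$

By using Lemma 4.5(ii), we easily have $EC_{1,n} \to 0$ and $EC_{3,n} \to 0$. On the
other hand, by using Theorem 3.4.2 of Nishiyama (2000b), it holds that
$C_{2,n}$ converges weakly to zero in $C_{d_H}(H)$ (recall the argument in the proof
of Lemma 4.1). Therefore, we have $\sup_{h\in H}|C_{2,n}(h)| = o_{P^*}(1)$.

Hence, the claim (a) is true. Since the first term on the right-hand side does
not depend on $h$ and the last term equals $M_n(h)$, with $M_n(h_0) = 0$, the
inequality $\mathcal{A}_n(\hat h_n) \le \inf_{h\in H}\mathcal{A}_n(h) +
o_{P^*}(1)$ implies that $M_n(\hat h_n) = o_{P^*}(1)$. The proof is finished. □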

5. Ergodic time series. 

5.1. Model and regularity conditions. Let us consider the time series
model given by

$$X_i = S(X_{i-1},\dots,X_{i-q_1};\theta) + \sigma(X_{i-1},\dots,X_{i-q_2};h)\,w_i.$$

By putting $q = q_1 \vee q_2$ and changing the domains of the functions $S$ and $\sigma$,
without loss of generality, we can write

$$X_i = S(\mathbf{X}_{i-1};\theta) + \sigma(\mathbf{X}_{i-1};h)\,w_i,$$

where $\mathbf{X}_{i-1} = (X_{i-1},\dots,X_{i-q})$ and $S(\cdot;\theta)$ and $\sigma(\cdot;h)$ are some measurable
functions on $\mathbb{R}^q$. For simplicity, we assume that the initial values $(X_0,\dots,
X_{1-q}) = (x_0,\dots,x_{1-q})$ are fixed.

As for the noise $\{w_i\}$, we consider the following two cases:

Case G (Gaussian). $\{w_i\}$ are independent and identically distributed as
$\mathcal{N}(0,1)$.

Case M (Martingale). $E[w_i|\mathcal{F}_{i-1}] = 0$ and $E[w_i^2|\mathcal{F}_{i-1}] = 1$ almost surely,
where $\mathcal{F}_i = \sigma(X_j : j \le i)$.

Clearly, Case G is a special case of Case M. Unless we explicitly
declare the restriction to Case G, we consider Case M.

Let us list some conditions which are similar to those in
Section 4.1. We suppose that there exists a parametric family of $d$-dimensional
vector-valued functions $\{\dot S(\cdot;\theta); \theta \in \Theta\}$ on $\mathbb{R}^q$ which satisfies the following
conditions. Typically, they may be considered to be the derivatives of $S(\cdot;\theta)$
with respect to $\theta$, that is, $\dot S(\cdot;\theta) = (\frac{\partial}{\partial\theta_1}S(\cdot;\theta), \dots, \frac{\partial}{\partial\theta_d}S(\cdot;\theta))^T$. The function
$\Lambda$ appearing in B1 and B2 may be chosen to be common without loss of
generality.
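B1. $\Theta$ is a compact subset of $\mathbb{R}^d$. There exists a measurable function $\Lambda$
on $\mathbb{R}^q$ such that at the true $\theta_0 \in \Theta$,

$$S(x;\theta) - S(x;\theta_0) = \dot S(x;\theta_0)^T(\theta - \theta_0) + \Lambda(x)\,\epsilon(x;\theta,\theta_0),$$

where $\sup_{x\in\mathbb{R}^q}|\epsilon(x;\theta,\theta_0)| = o(\|\theta - \theta_0\|)$ as $\theta \to \theta_0$.

B2. There exists a measurable function $\Lambda$ on $\mathbb{R}^q$ such that

$$\sup_{\theta\in\Theta}|S(x;\theta)| \le \Lambda(x);$$
$$\sup_{\theta\in\Theta}\|\dot S(x;\theta)\| \le \Lambda(x);$$
$$\sigma^2(x;h_0) \le \Lambda(x), \qquad c := \inf_{h\in H}\inf_{x\in\mathbb{R}^q}\sigma^2(x;h) > 0;$$
$$\|\dot S(x;\theta) - \dot S(x;\theta')\| \le \Lambda(x)\,\|\theta - \theta'\| \qquad \forall\theta,\theta'\in\Theta;$$
$$|\sigma^2(x;h) - \sigma^2(x;h')| \le \Lambda(x)\,d_H(h,h') \qquad \forall h,h'\in H.$$

B3. The process $\{X_i\}_{i=1,2,\dots}$ is ergodic under the true $(\theta_0,h_0)$ in the sense
that for $q' = q$ and $q + 1$ there exists an invariant measure $\mu_{q'}$ such that for
every $\mu_{q'}$-integrable function $f$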



We also assume that 



f{xi, . . . ,Xq')lJ,qi{dxi ■ ■■dXq'). 



/ A(x)^^<;(dx) < oo, 

\xo\ + A{xi, . . . ,Xq)\'^fJ.q+l{dxodxl ■ ■ ■ dXq) < OO. 



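B4. The matrix

$$I(\theta_0,h_0) = \int_{\mathbb{R}^q}\frac{\dot S(x;\theta_0)\dot S(x;\theta_0)^T}{\sigma^2(x;h_0)}\,\mu_q(dx)$$

is invertible.

B5. The metric entropy condition for $(H, d_H)$ is satisfied:

$$\int_0^1\sqrt{\log N(H, d_H, \varepsilon)}\,d\varepsilon < \infty.$$

B6. For every $\varepsilon > 0$,

$$\inf_{\theta : \|\theta - \theta_0\| > \varepsilon}\left\|\int_{\mathbb{R}^q}\frac{\dot S(x;\theta)}{\sigma^2(x;h_0)}\,[S(x;\theta_0) - S(x;\theta)]\,\mu_q(dx)\right\| > 0.$$

See the end of Section 4.1 for the discussion of the choice of $(H, d_H)$.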
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5.2. Results. In order to explain the idea of our estimating function, 
let us first consider the Case G. We denote by Pn,u the distribution of 
{Xi, . . . , Xji} under 9 = 9q + n~^/'^u and h = ho, where u G M*^. By an easy 
computation, the log- likelihood ratio is given by 

log^(Xi,...,XO 

A 1 



(9) 



where 



^2a2(Xi;/io) 

X {\Xi - S{Xf,eo + n-i/2^)|2 - \Xi - S{Xi;eo)\^} 



n 



^n,u = E -Z^^^F-T-^{^^ - S{X,;eo)}{S{Xf,eo + n-^/^u) - 5(X,; ^o)} 



^ o-2(Xi;/io) 



and 



2o-2(Xi;/io) 

Under the above regularity conditions, we have 

A„,«-iAA(0,ii^/(6'o,/io)^i) and Bn,u-^ I{Oq,K)u. 

So it follows from the theory of local asymptotic normality that the
asymptotic efficiency bound in the Case G when $h_0$ is
known is the distribution $\mathcal{N}(0, I(\theta_0,h_0)^{-1})$. That is, if we obtain an estimator $\hat\theta_n$ such that

$$\sqrt{n}(\hat\theta_n - \theta_0) \xrightarrow{d} \mathcal{N}(0, I(\theta_0,h_0)^{-1}),$$

it is asymptotically efficient in the sense
of the local asymptotic minimax theorem. [See, e.g., Chapter 3.11 of van
der Vaart and Wellner (1996).] If the parameter $h$ is unknown, then the
estimation problem for $\theta$ becomes more difficult. So if we have an estimator
which asymptotically behaves as stated above, then we may say that it is
asymptotically efficient even with the nuisance parameter $h$. This argument is not
valid in the Case M, where the log-likelihood ratio does not equal the formula (9),
but we propose to use it for deriving an estimating equation which yields
the same asymptotic distribution as in the Case G.
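The decomposition of the log-likelihood ratio into a stochastic term and a quadratic term can be checked numerically. The following is only a minimal sketch under assumptions that are not part of the paper: $q = 1$, a toy drift $S(x;\theta) = \theta x$, a toy variance function $\sigma^2(x;h) = 1 + h x^2/(1+x^2)$, Gaussian noise, and hypothetical helper names (`simulate`, `log_lr`, `a_and_b`). It takes the quadratic term with the factor $1/(2\sigma^2)$ inside the sum, so that the log-likelihood ratio equals the stochastic term minus the quadratic term. No attempt is made to verify B1-B6 for this toy model.

```python
import math
import random


def sigma2(x, h):
    """Toy variance function 1 + h * x^2 / (1 + x^2) (an assumption, not from the paper)."""
    return 1.0 + h * x * x / (1.0 + x * x)


def drift(x, theta):
    """Toy drift S(x; theta) = theta * x (an assumption, not from the paper)."""
    return theta * x


def simulate(n, theta0, h0, seed=0):
    """Return pairs (X_{i-1}, X_i) from X_i = S + sigma * w_i with w_i ~ N(0, 1)."""
    rng = random.Random(seed)
    x, pairs = 0.0, []
    for _ in range(n):
        y = drift(x, theta0) + math.sqrt(sigma2(x, h0)) * rng.gauss(0.0, 1.0)
        pairs.append((x, y))
        x = y
    return pairs


def log_lr(pairs, theta0, h0, u):
    """Gaussian log-likelihood ratio of theta0 + u / sqrt(n) against theta0."""
    t = theta0 + u / math.sqrt(len(pairs))
    return -sum(((y - drift(x, t)) ** 2 - (y - drift(x, theta0)) ** 2)
                / (2.0 * sigma2(x, h0)) for x, y in pairs)


def a_and_b(pairs, theta0, h0, u):
    """Stochastic term A and quadratic term B of the decomposition."""
    t = theta0 + u / math.sqrt(len(pairs))
    a = sum((y - drift(x, theta0)) * (drift(x, t) - drift(x, theta0))
            / sigma2(x, h0) for x, y in pairs)
    b = sum((drift(x, t) - drift(x, theta0)) ** 2
            / (2.0 * sigma2(x, h0)) for x, y in pairs)
    return a, b


pairs = simulate(2000, theta0=0.5, h0=1.0)
a, b = a_and_b(pairs, 0.5, 1.0, u=1.5)
# The two printed numbers agree up to floating-point rounding.
print(log_lr(pairs, 0.5, 1.0, u=1.5), a - b)
```

The identity is purely algebraic, so it holds for any sample path; only the limits of the two terms rely on the regularity conditions.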

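Not only in the Case G but also in the Case M, differentiating (9) formally,
and replacing the true $h_0$ by the unknown parameter $h$, we propose the
estimating function

$$\Psi_n(\theta,h) = \frac{1}{n}\sum_{i=1}^{n} \frac{\dot{S}(\mathbf{X}_i;\theta)}{\sigma^2(\mathbf{X}_i;h)}\,\{X_i - S(\mathbf{X}_i;\theta)\}.$$

Its compensator is

$$\tilde\Psi_n(\theta,h) = \frac{1}{n}\sum_{i=1}^{n} \frac{\dot{S}(\mathbf{X}_i;\theta)}{\sigma^2(\mathbf{X}_i;h)}\,\{S(\mathbf{X}_i;\theta_0) - S(\mathbf{X}_i;\theta)\}.$$

Thus, it holds that

$$\Psi_n(\theta,h) - \tilde\Psi_n(\theta,h) = \frac{1}{n}\sum_{i=1}^{n} \frac{\dot{S}(\mathbf{X}_i;\theta)}{\sigma^2(\mathbf{X}_i;h)}\,\sigma(\mathbf{X}_i;h_0)\,w_i,$$

which is the summation of a $C_\rho(\Theta \times H)$-valued martingale difference
array where $\rho = \|\cdot\| \vee d_H$. By using Jain-Marcus' central limit theorem for
martingales given by Nishiyama (1996, 2000a, 2000b), we have the following
lemma which plays a key role in our approach.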

Lemma 5.1. Under B1-B3 and B5, the sequence of random fields $\sqrt{n}(\Psi_n - \tilde\Psi_n)$,
with parameter $(\theta,h)$, converges weakly in $C_\rho(\Theta \times H)$ to a zero-mean
Gaussian random field $Z$ with the covariance

$$E\,Z(\theta,h)Z(\theta',h')^{\top} = \int_{\mathbb{R}^q} \frac{\dot{S}(x;\theta)\,\dot{S}(x;\theta')^{\top}}{\sigma^2(x;h)\,\sigma^2(x;h')}\,\sigma^2(x;h_0)\,\mu_q(dx).$$

In particular, the random variable $Z(\theta_0,h_0)$ is distributed as $\mathcal{N}(0, I(\theta_0,h_0))$.

Another lemma which is necessary to apply Theorem 2.1 is the following.



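Lemma 5.2. Under B1-B4, for any random sequence $(\hat\theta_n,\hat h_n)$ such that
$\|\hat\theta_n - \theta_0\| \vee d_H(\hat h_n,h_0) = o_{P^*}(1)$, it holds that

$$\tilde\Psi_n(\hat\theta_n,\hat h_n) - \tilde\Psi_n(\theta_0,h_0) - (-I(\theta_0,h_0))(\hat\theta_n - \theta_0) = o_{P^*}(\|\hat\theta_n - \theta_0\|).$$

Noting also that $\tilde\Psi_n(\theta_0,h_0) = 0$, we can apply Theorem 2.1 to conclude
the following theorem.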

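Theorem 5.3. Under B1-B5, for any random sequence $(\hat\theta_n,\hat h_n)$ such
that

$$\|\hat\theta_n - \theta_0\| = o_{P^*}(1), \quad d_H(\hat h_n,h_0) = o_{P^*}(1) \quad\text{and}\quad \Psi_n(\hat\theta_n,\hat h_n) = o_{P^*}(n^{-1/2}),$$

the estimator $\hat\theta_n$ is asymptotically normal:

$$\sqrt{n}(\hat\theta_n - \theta_0) \xrightarrow{d} \mathcal{N}(0, I(\theta_0,h_0)^{-1}).$$

In particular, in the Case G, the estimator $\hat\theta_n$ is asymptotically efficient.
When B6 is also satisfied, the assumption "$\|\hat\theta_n - \theta_0\| = o_{P^*}(1)$" is
automatically satisfied.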
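To make the estimating equation concrete, the following is a minimal sketch of a Z-estimator in a toy model. Everything specific here is an assumption made for illustration and is not taken from the paper: $q = 1$, drift $S(x;\theta) = \theta x$ (so that its derivative in $\theta$ is $x$), variance function $\sigma^2(x;h) = 1 + h x^2/(1+x^2)$, Gaussian noise, $\Theta = [-0.9, 0.9]$, and the helper names `simulate`, `psi_n` and `z_estimate`. The nuisance value `h_hat` is simply supplied by the caller, and the conditions B1-B6 are not verified.

```python
import math
import random


def sigma2(x, h):
    """Toy variance function 1 + h * x^2 / (1 + x^2) (an assumption)."""
    return 1.0 + h * x * x / (1.0 + x * x)


def simulate(n, theta0, h0, seed=0):
    """Simulate X_i = theta0 * X_{i-1} + sigma(X_{i-1}; h0) * w_i, w_i ~ N(0, 1).

    Returns a list of pairs (X_{i-1}, X_i).
    """
    rng = random.Random(seed)
    x, pairs = 0.0, []
    for _ in range(n):
        y = theta0 * x + math.sqrt(sigma2(x, h0)) * rng.gauss(0.0, 1.0)
        pairs.append((x, y))
        x = y
    return pairs


def psi_n(theta, h, pairs):
    """Estimating function for the toy drift theta * x:
    (1/n) * sum of x * (y - theta * x) / sigma^2(x; h)."""
    return sum(x * (y - theta * x) / sigma2(x, h) for x, y in pairs) / len(pairs)


def z_estimate(h_hat, pairs, lo=-0.9, hi=0.9, tol=1e-10):
    """Solve psi_n(theta, h_hat) = 0 by bisection on [lo, hi].

    In this toy model psi_n is strictly decreasing in theta, so bisection is
    valid whenever the root is bracketed by [lo, hi].
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if psi_n(mid, h_hat, pairs) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


pairs = simulate(5000, theta0=0.5, h0=1.0)
theta_hat = z_estimate(h_hat=1.0, pairs=pairs)
print(theta_hat, psi_n(theta_hat, 1.0, pairs))
```

Because the toy drift is linear in $\theta$, the root could also be written in closed form as a weighted least squares estimator; bisection is used only to mirror the general "solve the estimating equation" formulation.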
For the same reason as in Section 4, it is necessary to develop a procedure
to construct a consistent estimator $\hat h_n$ for the nuisance parameter $h \in H$.
The following theorem gives an answer.

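Theorem 5.4. Assume B2, B3 and B5.

(First step: initial estimator for $\theta_0$.) Suppose the identifiability condition

$$\inf_{\theta:\,\|\theta-\theta_0\|>\epsilon} \int_{\mathbb{R}^q} |S(x;\theta) - S(x;\theta_0)|^2\,\mu_q(dx) > 0 \quad \forall\,\epsilon > 0$$

is satisfied. If a random sequence $\tilde\theta_n$ satisfies $A_n(\tilde\theta_n) \le \inf_{\theta\in\Theta} A_n(\theta) + o_{P^*}(1)$, where

$$A_n(\theta) = \frac{1}{n}\sum_{i=1}^{n} |X_i - S(\mathbf{X}_i;\theta)|^2,$$

then it holds that $\|\tilde\theta_n - \theta_0\| = o_{P^*}(1)$.

(Second step: consistent estimator for $h_0$.) Suppose the identifiability
condition

$$\inf_{h:\,d_H(h,h_0)>\epsilon} \int_{\mathbb{R}^q} |\sigma^2(x;h) - \sigma^2(x;h_0)|^2\,\mu_q(dx) > 0 \quad \forall\,\epsilon > 0$$

is satisfied. Merely for a technical reason, assume that there exists a constant
$L_4 > 0$ such that $E[w_i^4 \mid \mathcal{F}_{i-1}] \le L_4$ almost surely for all $i$. Using $\tilde\theta_n$ as above,
we define

$$B_n(h) = \frac{1}{n}\sum_{i=1}^{n} \bigl|\,|X_i - S(\mathbf{X}_i;\tilde\theta_n)|^2 - \sigma^2(\mathbf{X}_i;h)\bigr|^2.$$

If a random sequence $\hat h_n$ satisfies $B_n(\hat h_n) \le \inf_{h\in H} B_n(h) + o_{P^*}(1)$, then it
holds that $d_H(\hat h_n,h_0) = o_{P^*}(1)$.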
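The two-step procedure can be sketched in a toy model as follows. The specifics are assumptions made only for illustration and do not come from the paper: $q = 1$, drift $S(x;\theta) = \theta x$, variance function $\sigma^2(x;h) = 1 + h x^2/(1+x^2)$ with a scalar $h \in [0,2]$ (so the nuisance parameter is finite-dimensional here), Gaussian noise, and the helper names `a_n`, `b_n`, `argmin_on_grid` and `two_step`. Both contrast functions are minimized over a fixed grid, which yields only an approximate minimizer; the sketch ignores this discretization error.

```python
import math
import random


def sigma2(x, h):
    """Toy variance function 1 + h * x^2 / (1 + x^2) (an assumption)."""
    return 1.0 + h * x * x / (1.0 + x * x)


def simulate(n, theta0, h0, seed=0):
    """Return pairs (X_{i-1}, X_i) from X_i = theta0 * X_{i-1} + sigma * w_i."""
    rng = random.Random(seed)
    x, pairs = 0.0, []
    for _ in range(n):
        y = theta0 * x + math.sqrt(sigma2(x, h0)) * rng.gauss(0.0, 1.0)
        pairs.append((x, y))
        x = y
    return pairs


def a_n(theta, pairs):
    """First-step contrast: mean squared residual of the drift."""
    return sum((y - theta * x) ** 2 for x, y in pairs) / len(pairs)


def b_n(h, theta_init, pairs):
    """Second-step contrast: mean squared gap between squared residuals
    (computed with the initial estimator) and the candidate variance."""
    return sum(((y - theta_init * x) ** 2 - sigma2(x, h)) ** 2
               for x, y in pairs) / len(pairs)


def argmin_on_grid(f, lo, hi, steps):
    """Return the point of an equally spaced grid on [lo, hi] minimizing f."""
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return min(grid, key=f)


def two_step(pairs):
    """Initial estimator for theta, then plug it in to estimate h."""
    theta_init = argmin_on_grid(lambda t: a_n(t, pairs), -0.9, 0.9, 90)
    h_hat = argmin_on_grid(lambda h: b_n(h, theta_init, pairs), 0.0, 2.0, 100)
    return theta_init, h_hat


theta_init, h_hat = two_step(simulate(6000, theta0=0.5, h0=1.0))
print(theta_init, h_hat)
```

The resulting `h_hat` could then be plugged into the estimating equation for $\theta$; in a genuinely infinite-dimensional $H$ the grid would have to be replaced by a sieve or another approximation scheme, which is outside the scope of this sketch.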

5.3. Proofs. 

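Proof of Lemma 5.1. To show the finite-dimensional convergence is
easy. Notice that

$$\begin{aligned}
\left\| \frac{\dot{S}(x;\theta)}{\sigma^2(x;h)} - \frac{\dot{S}(x;\theta')}{\sigma^2(x;h')} \right\|
&\le \frac{\|\dot{S}(x;\theta) - \dot{S}(x;\theta')\|}{\sigma^2(x;h)} + \|\dot{S}(x;\theta')\| \left| \frac{1}{\sigma^2(x;h)} - \frac{1}{\sigma^2(x;h')} \right| \\
&\le \frac{\Lambda(x)}{c}\,\|\theta - \theta'\| + \frac{\Lambda(x)^2}{c^2}\, d_H(h,h') \\
&\le \left( \frac{\Lambda(x)}{c} + \frac{\Lambda(x)^2}{c^2} \right) \bigl( \|\theta - \theta'\| \vee d_H(h,h') \bigr).
\end{aligned} \tag{10}$$

The assertion follows from Proposition 4.5 of Nishiyama (2000a). [Or, see
Nishiyama (1996) which is easier to read.] □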

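Proof of Lemma 5.2. Notice that $\tilde\Psi_n(\theta_0,h_0) = 0$. For any (possibly
random) sequence $(\hat\theta_n,\hat h_n)$ converging in outer probability to $(\theta_0,h_0)$, it holds
that

\begin{align*}
\tilde\Psi_n(\hat\theta_n,\hat h_n)
&= \frac{1}{n}\sum_{i=1}^{n} \frac{\dot{S}(\mathbf{X}_i;\hat\theta_n)}{\sigma^2(\mathbf{X}_i;\hat h_n)}\,\dot{S}(\mathbf{X}_i;\theta_0)^{\top}(\theta_0 - \hat\theta_n) + o_{P^*}(\|\hat\theta_n - \theta_0\|) \tag{11} \\
&= \frac{1}{n}\sum_{i=1}^{n} \frac{\dot{S}(\mathbf{X}_i;\theta_0)}{\sigma^2(\mathbf{X}_i;h_0)}\,\dot{S}(\mathbf{X}_i;\theta_0)^{\top}(\theta_0 - \hat\theta_n) + o_{P^*}(\|\hat\theta_n - \theta_0\|) \\
&= -I(\theta_0,h_0)(\hat\theta_n - \theta_0) + o_{P^*}(\|\hat\theta_n - \theta_0\|).
\end{align*}

To show (11) above, use the same argument as in the proof of Lemma 4.2, with
(10) in place of (6). □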

Proof of Theorem 5.3. The assertions follow from Theorems 2.1 and 
2.3 by using also Lemmas 5.1 and 5.2, and Theorem 3.1(ii), respectively. □ 

Proof of Theorem 5.4. To prove the first step, we will apply
Corollary 3.2.3 of van der Vaart and Wellner (1996). We can write $A_n(\theta) =
T_{1,n} + T_{2,n}(\theta) + T_{3,n}(\theta)$ where

$$T_{1,n} = \frac{1}{n}\sum_{i=1}^{n} \sigma^2(\mathbf{X}_i;h_0)\,w_i^2,$$

$$T_{2,n}(\theta) = \frac{2}{n}\sum_{i=1}^{n} \bigl(S(\mathbf{X}_i;\theta_0) - S(\mathbf{X}_i;\theta)\bigr)\,\sigma(\mathbf{X}_i;h_0)\,w_i,$$

$$T_{3,n}(\theta) = \frac{1}{n}\sum_{i=1}^{n} |S(\mathbf{X}_i;\theta_0) - S(\mathbf{X}_i;\theta)|^2.$$

The term $T_{1,n}$ converges in probability to a constant $C_1$. On the other hand,
by using Proposition 4.5 of Nishiyama (2000a), we have that $\sqrt{n}\,T_{2,n}$
converges weakly in $C(\Theta)$ to a tight law; thus, $\sup_{\theta\in\Theta} |T_{2,n}(\theta)|$ converges in
outer probability to zero. Finally, by Theorem 3.1, it holds that
$\sup_{\theta\in\Theta} |T_{3,n}(\theta) - T_3(\theta)| = o_{P^*}(1)$ where

$$T_3(\theta) = \int_{\mathbb{R}^q} |S(x;\theta_0) - S(x;\theta)|^2\,\mu_q(dx).$$

Hence, we have $\sup_{\theta\in\Theta} |A_n(\theta) - (C_1 + T_3(\theta))| = o_{P^*}(1)$, and van der Vaart
and Wellner's (1996) consistency theorem yields the assertion of the first
step.
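The three-term decomposition of the least squares contrast used in the first step is an exact algebraic identity once $X_i$ is written as drift plus noise, with the cross term carrying a factor 2. The following minimal sketch checks it numerically in a toy model. The model is an assumption made for illustration only ($q = 1$, drift $\theta x$, variance function $1 + h x^2/(1+x^2)$, Gaussian noise), and the helper names `simulate` and `decomposition` are hypothetical.

```python
import math
import random


def sigma2(x, h):
    """Toy variance function 1 + h * x^2 / (1 + x^2) (an assumption)."""
    return 1.0 + h * x * x / (1.0 + x * x)


def simulate(n, theta0, h0, seed=0):
    """Return triples (X_{i-1}, X_i, w_i), keeping the noise for the check."""
    rng = random.Random(seed)
    x, rows = 0.0, []
    for _ in range(n):
        w = rng.gauss(0.0, 1.0)
        y = theta0 * x + math.sqrt(sigma2(x, h0)) * w
        rows.append((x, y, w))
        x = y
    return rows


def decomposition(rows, theta, theta0, h0):
    """Return the contrast at theta and its three components:
    pure noise term, cross term (with the factor 2), squared drift gap."""
    n = len(rows)
    contrast = sum((y - theta * x) ** 2 for x, y, _ in rows) / n
    t1 = sum(sigma2(x, h0) * w * w for x, _, w in rows) / n
    t2 = 2.0 * sum((theta0 * x - theta * x) * math.sqrt(sigma2(x, h0)) * w
                   for x, _, w in rows) / n
    t3 = sum((theta0 * x - theta * x) ** 2 for x, _, _ in rows) / n
    return contrast, t1, t2, t3


contrast, t1, t2, t3 = decomposition(simulate(4000, 0.5, 1.0), 0.2, 0.5, 1.0)
# The contrast and the sum of the three terms agree up to rounding.
print(contrast, t1 + t2 + t3)
```

Only the third term depends on $\theta$ in the limit, which is why the identifiability condition is stated in terms of the squared drift gap.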

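To prove the second step, we shall apply Corollary 3.2.3 of van der Vaart
and Wellner (1996) again. Let us first see that $\sup_{h\in H} |B_n(h) - \tilde B_n(h)| =
o_{P^*}(1)$ where

$$\tilde B_n(h) = \frac{1}{n}\sum_{i=1}^{n} \bigl|\,|X_i - S(\mathbf{X}_i;\theta_0)|^2 - \sigma^2(\mathbf{X}_i;h)\bigr|^2.$$

Now, notice that

$$\begin{aligned}
&\Bigl|\, \bigl||X_i - S(\mathbf{X}_i;\tilde\theta_n)|^2 - \sigma^2(\mathbf{X}_i;h)\bigr|^2 - \bigl||X_i - S(\mathbf{X}_i;\theta_0)|^2 - \sigma^2(\mathbf{X}_i;h)\bigr|^2 \Bigr| \\
&\quad\le \bigl||X_i - S(\mathbf{X}_i;\tilde\theta_n)|^2 + |X_i - S(\mathbf{X}_i;\theta_0)|^2 - 2\sigma^2(\mathbf{X}_i;h)\bigr| \\
&\qquad\times \bigl||X_i - S(\mathbf{X}_i;\tilde\theta_n)|^2 - |X_i - S(\mathbf{X}_i;\theta_0)|^2\bigr| \\
&\quad\le \bigl||X_i - S(\mathbf{X}_i;\tilde\theta_n)|^2 + |X_i - S(\mathbf{X}_i;\theta_0)|^2 - 2\sigma^2(\mathbf{X}_i;h)\bigr| \\
&\qquad\times |2X_i - S(\mathbf{X}_i;\tilde\theta_n) - S(\mathbf{X}_i;\theta_0)| \times |S(\mathbf{X}_i;\tilde\theta_n) - S(\mathbf{X}_i;\theta_0)| \\
&\quad\le L(X_i,\mathbf{X}_i)\,\|\tilde\theta_n - \theta_0\|,
\end{aligned}$$

where $L(x_0,x_1,\ldots,x_q) = C\,\bigl||x_0| + \Lambda(x_1,\ldots,x_q)\bigr|^4$ for a constant $C$. Given any
$\epsilon > 0$ choose $M > 0$ such that

$$\int L(x_0,x_1,\ldots,x_q)\,1\{L(x_0,x_1,\ldots,x_q) > M\}\,\mu_{q+1}(dx_0,dx_1,\ldots,dx_q) < \epsilon.$$

Then we can write

$$\sup_{h\in H} |B_n(h) - \tilde B_n(h)| \le M\,\|\tilde\theta_n - \theta_0\| + \frac{1}{n}\sum_{i=1}^{n} L(X_i,\mathbf{X}_i)\,1\{L(X_i,\mathbf{X}_i) > M\}\,\mathrm{diam}(\Theta).$$

The second term of the right-hand side converges to a nonnegative constant
which is smaller than $\epsilon \cdot \mathrm{diam}(\Theta)$. Since the choice of $\epsilon > 0$ is arbitrary, we
have $\sup_{h\in H} |B_n(h) - \tilde B_n(h)| = o_{P^*}(1)$.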

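Now, we can write $\tilde B_n(h) = T_{1,n} + T_{2,n}(h) + T_{3,n}(h)$, where

$$T_{1,n} = \frac{1}{n}\sum_{i=1}^{n} \sigma^4(\mathbf{X}_i;h_0)\,(w_i^2 - 1)^2,$$

$$T_{2,n}(h) = \frac{2}{n}\sum_{i=1}^{n} \sigma^2(\mathbf{X}_i;h_0)\,(w_i^2 - 1)\,\bigl(\sigma^2(\mathbf{X}_i;h_0) - \sigma^2(\mathbf{X}_i;h)\bigr),$$

$$T_{3,n}(h) = \frac{1}{n}\sum_{i=1}^{n} |\sigma^2(\mathbf{X}_i;h_0) - \sigma^2(\mathbf{X}_i;h)|^2.$$

The term $T_{1,n}$ converges in probability to a constant $C_1$ by assumption.
By Jain-Marcus' CLT for martingale difference arrays, it is easy to show
that $\sup_{h\in H} |T_{2,n}(h)| = o_{P^*}(1)$ (here, we use the technical assumption that
$E[w_i^4 \mid \mathcal{F}_{i-1}]$ is bounded). Finally, by Theorem 3.1, it holds that
$\sup_{h\in H} |T_{3,n}(h) - T_3(h)| = o_{P^*}(1)$ where

$$T_3(h) = \int_{\mathbb{R}^q} |\sigma^2(x;h_0) - \sigma^2(x;h)|^2\,\mu_q(dx).$$

Consequently, we have $\sup_{h\in H} |B_n(h) - (C_1 + T_3(h))| = o_{P^*}(1)$. Therefore,
the claim of the second step follows from van der Vaart and Wellner's (1996)
consistency theorem. □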